On Regression-Tree-Based Synthetic Data Methods for Business Data

Authors

  • Joo Ho Lee
  • Yong Kim
  • Christine M. O’Keefe

Abstract

Balancing the competing objectives of allowing statistical analysis of confidential data and maintaining confidentiality is of great interest to national statistical agencies and other data custodians seeking to make their data available for research. This balance is often characterised as a trade-off between disclosure risk and data utility, where disclosure risk attempts to capture the probability that a data release results in a disclosure, while data utility attempts to capture some measure of the usefulness of the released data; see [6]. To date, most of the literature addressing this balance has focussed on data about individuals; however, the same problem arises for data about businesses and enterprises. The purpose of this paper is to provide an empirical evaluation of existing methodology, developed for data about individuals, when it is applied to business data.

Similar resources

Forest Stand Types Classification Using Tree-Based Algorithms and SPOT-HRG Data

Forest type mapping is one of the most necessary elements in forest management and silviculture treatments. Traditional methods such as field surveys are time-consuming and cost-intensive. Improvements in remote sensing data sources and classification-estimation methods are creating new opportunities for obtaining more accurate maps of forest biophysical attributes. This research co...

Modelling Customer Attraction Prediction in Customer Relation Management using Decision Tree: A Data Mining Approach

In today's quality-based competitive world, known as the knowledge age, customer attraction is of ultimate importance. In keeping with the slogan “the customer is always right”, customer relation management is the core of an organizational strategy, playing an important role in four aspects: customer identification, customer attraction, customer retention, and customer satisfaction. Commercial organiza...

The application of data mining techniques in manipulated financial statement classification: The case of Turkey

Predicting false financial statements to detect fraud in companies is an increasing trend in recent studies. Manipulations in financial statements can be discovered by auditors when the related financial records and indicators are analyzed in depth, together with the auditors' experience, in order to create knowledge for developing a decision support system to classify firms. Auditors may annot...

A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements

Financial statement fraud has increasingly become a serious problem for business, government, and investors. In fact, this threatens the reliability of capital markets, corporate heads, and even the audit profession. Auditors in particular face their apparent inability to detect large-scale fraud, and there are various ways to identify this problem. In order to identify this problem, the majori...

An Effective Tree-Based Algorithm for Ordinal Regression

Recently, ordinal regression has attracted much interest in machine learning. The goal of ordinal regression is to assign each instance a rank that is as close as possible to its true rank. We propose an effective tree-based algorithm, called Ranking Tree, for ordinal regression. The main advantage of Ranking Tree is that it can group samples with closer ranks together in the process of...

Publication date: 2013